IterableStore #191

rubanm · 2013-11-22T05:52:57Z

Addresses #48

I'm thinking it may be better to not mix in scala.collection.Iterable, so we can wrap the store's iterator in a Future.

johnynek · 2013-11-22T19:22:57Z

storehaus-core/src/main/scala/com/twitter/storehaus/IterableStore.scala

+ * For other stores, the iterable should ideally be backed by a stream.
+ */
+trait IterableStore[K, +V] extends ReadableStore[K, V] {
+  def iterator: Future[Iterator[(K, V)]]


I think this should be a Future[Spool[(K,V)]]. We don't want blocking on .next. It should be easy to convert an Iterable to a Spool to get the map implementation.

rubanm · 2013-11-26T06:21:06Z

Changed to Spool[(K, V)].
Not completely sure if my implementation is correct.

johnynek · 2013-11-26T06:24:09Z

storehaus-core/src/main/scala/com/twitter/storehaus/IterableStore.scala

+ */
+trait IterableStore[K, V] extends ReadableStore[K, V] {
+
+  protected def iteratorToSpool(it: Iterator[(K, V)]): Future[Spool[(K, V)]] = {


can these two methods move to the companion object?

rubanm · 2013-11-26T07:01:17Z

Thanks for the further feedback. Updated the diff.

ryanlecompte · 2013-11-26T07:22:19Z

storehaus-core/src/main/scala/com/twitter/storehaus/IterableStore.scala

+    s
+  }
+
+  protected def fillSpool[K, V](it: Iterator[(K, V)], s: Promise[Spool[(K, V)]]): Unit = {


This will actually eagerly evaluate the iterator, which I don't think you want. See this:

scala> def fillSpool[K, V](it: Iterator[(K, V)], s: Promise[Spool[(K, V)]]): Unit = {
     |   if (it.isEmpty) {
     |     s() = Return(Spool.empty[(K, V)])
     |   } else {
     |     val next = new Promise[Spool[(K, V)]]
     |     s() = Return(it.next *:: next)
     |     fillSpool(it, next)
     |   }
     | }
fillSpool: [K, V](it: Iterator[(K, V)], s: com.twitter.util.Promise[com.twitter.concurrent.Spool[(K, V)]])Unit

scala> val s = new Promise[Spool[(Int,Int)]]
s: com.twitter.util.Promise[com.twitter.concurrent.Spool[(Int, Int)]] = Promise@793497799(state=Waiting(null,List()))

scala> val it = Seq(1,2,3,4,5).iterator.map { i => println("hi!"); (i, i*2) }
it: Iterator[(Int, Int)] = non-empty iterator

scala> fillSpool(it, s)
hi!
hi!
hi!
hi!
hi!

I think you want to lazily evaluate the iterator.

Should we just use the existing SpoolSource to create these?

You're right. We'd want lazy evaluation here. The following did not work for me either:

def iterator2Spool[K, V](it: Iterator[(K, V)]): Spool[(K, V)] = {
  if (it.isEmpty) {
    Spool.empty[(K, V)]
  } else {
    Spool.cons(it.next, Future.value(iterator2Spool(it)))
  }
}

Maybe I'm missing something about how Spool does deferred tails.
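For what it's worth, the eagerness in both attempts seems to come from strict argument evaluation: in `Future.value(iterator2Spool(it))` the recursive call runs before `Future.value` is ever entered. Below is a minimal sketch of how a by-name tail defers the pull. `DeferredList`, `DeferredCons`, `DeferredNil` and `DeferredListDemo` are hypothetical names invented for this illustration; this is not `com.twitter.concurrent.Spool` and says nothing about how Spool is implemented.

```scala
// Hypothetical stand-in for a stream with a deferred tail.
// This is NOT com.twitter.concurrent.Spool; it only illustrates by-name tails.
sealed trait DeferredList[+A]

case object DeferredNil extends DeferredList[Nothing]

// `tl` is a by-name parameter, so the expression passed for it is not
// evaluated when the cell is constructed.
final class DeferredCons[A](val head: A, tl: => DeferredList[A]) extends DeferredList[A] {
  // Evaluated at most once, on first access.
  lazy val tail: DeferredList[A] = tl
}

object DeferredList {
  // Pulls one element from the iterator now; each further element is pulled
  // only when the corresponding `tail` is forced.
  def fromIterator[A](it: Iterator[A]): DeferredList[A] =
    if (it.hasNext) {
      val first = it.next()
      new DeferredCons(first, fromIterator(it))
    } else DeferredNil
}

object DeferredListDemo {
  // Returns how many elements were pulled (a) right after construction and
  // (b) after forcing a single tail.
  def pullCounts(): (Int, Int) = {
    var pulled = 0
    val it = Iterator(1, 2, 3, 4, 5).map { i => pulled += 1; i }
    val list = DeferredList.fromIterator(it)
    val afterConstruction = pulled
    // Forcing the first tail pulls exactly one more element.
    val hasSecond = list match {
      case cons: DeferredCons[_] => cons.tail != DeferredNil
      case _                     => false
    }
    require(hasSecond)
    (afterConstruction, pulled)
  }
}
```

Here `DeferredListDemo.pullCounts()` reports one pull after construction and two after forcing the first tail, rather than five up front.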

Do we actually want a Spool here? It seems to me that a Spool is useful when you want to essentially have a Stream whose tail is evaluated not lazily, but rather asynchronously (still eagerly, but just asynchronously). I guess we could see how it's used in Finagle.

Yep, seems like we'll need to create a lazy version of Spool (doable I think), or maybe use Stream[Future[(K, V)]] instead.

@johnynek thoughts?

Thinking about this more, it might be best to provide an api that lets you provide a batchSize, and just return an Iterable[Future[(K, V)]]. The other possibility would be to supply a new Cursor abstraction, which lets you specify a batch size.

class Cursor[A] { def fetch(batchSize: Int): Future[Seq[A]] }

thoughts:

lazy or bust.

I think a type like:

trait FStream[+T] {
  def head: Future[T]
  def tail: FStream[T]
}

is what we want (I think a StreamT monad transformer on Future is what is happening here, I think scalaz has this). I thought this was basically Spool.

If you have Iterable[Future[T]] you can do .toList.size, which implies you can get the size without waiting, which is false. We don't know the cardinality of the keys without an Await.

I think we should stay concrete while we abstract: what will we implement? I would like to use this paging to handle things like a ReadableStore[K, FStream[V]] where K is a user, and V are the followers: since some users have millions of followers, we can't assume we can fit the response in one unpaged value. What other use cases do people have?

Are you going to permit people to do out of order requests? What will you do if I do

// assume stream: FStream[A]
val tail = stream.tail
tail.tail.head
tail.head

? Does the iterable abstraction really support that?
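One way the out-of-order case could be handled, sketched under assumptions: each cell memoizes its element and forces its predecessor first, so a shared iterator is still consumed in order. `IteratorFStream` and `FStreamDemo` are hypothetical names, and `scala.concurrent.Future` stands in for `com.twitter.util.Future` to keep the sketch dependency-free. Every cell keeps a reference to the previous one, so this is not suitable for long streams as written.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.util.Try

// The FStream shape from the comment above, with scala.concurrent.Future
// standing in for com.twitter.util.Future (an assumption for this sketch).
trait FStream[+T] {
  def head: Future[T]
  def tail: FStream[T]
}

// Hypothetical iterator-backed implementation. Every cell memoizes its element
// and forces its predecessor first, so the shared iterator is always consumed
// in order no matter which head is requested first.
final class IteratorFStream[T] private (it: Iterator[T], prev: Option[IteratorFStream[T]])
    extends FStream[T] {

  private lazy val element: Try[T] = {
    prev.foreach(_.element) // make sure earlier cells have taken their element
    Try(it.next())          // past the end this is a Failure(NoSuchElementException)
  }

  def head: Future[T] = Future.fromTry(element)

  lazy val tail: FStream[T] = new IteratorFStream(it, Some(this))
}

object IteratorFStream {
  def apply[T](it: Iterator[T]): FStream[T] = new IteratorFStream(it, None)
}

object FStreamDemo {
  // Asks for the third element before the second, as in the question above.
  def outOfOrder(): (Int, Int) = {
    val stream = IteratorFStream(Iterator(10, 20, 30))
    val tail = stream.tail
    val third = Await.result(tail.tail.head, 1.second)
    val second = Await.result(tail.head, 1.second)
    (third, second)
  }
}
```

Heads requested past the end come back as failed Futures in this sketch, which is one possible reading of "start sending Future.exceptions after a certain point".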

I think batching is pretty nice. You might be able to refactor FStream to support batching by specifying a headsize with your tail.

.toList.size would behave the same as it usually does with infinite iterables (ie it would never end). The semantics would definitely be weird, but the right way for it to work would be to just start sending Future.exceptions after a certain point. I don't think this is necessarily the right way--you might want to convert it to a Spool if someone tries to do a "foreach"-style operation so that it will be correct.

Hey gang: Spool is supposed to have deferred behavior, but we found a few bugs in the *:: syntax that were forcing the tail. I'll see if I can "force" a util-core release including the fix. #rimshot

Ok, @evnm released util-core 6.11.0, which includes the Spool fix: twitter/util@12faddb

softprops · 2013-11-30T19:41:47Z

@mosesn +10 for a Cursor interface.

I was waiting for this to go to develop before pitching the idea for a way to iterate over the keys of a store, which is often a less expensive operation. You can now do this in redis with the new scan-flavored commands in 2.8 as a relatively cheap operation, with the one requirement that you pass in what could be considered a cursor. In order to implement this I'd need the cursor to be parameterized in terms of its input, so I'd want to change your suggestion to something like

trait Cursor[S, A] {
  def fetch(state: S): Future[Seq[A]]
}

or even just a function alias defined in the iterable store companion object

type Cursor[S, A] = (S => Future[Seq[A]])

I'm probably getting ahead of myself, but I thought it would be worth bringing up as other types of iterable (by value or by key) stores may have the same requirements.

rubanm · 2013-12-03T06:32:59Z

Thanks for the ideas. My initial thinking was that IterableStore can be used for iterating over key-values, keys, etc. and a separate Paging store can be used for paginated reads. However, now I realize Iterable is just a special case of Paging store (i.e. batch size 1).

Given this, I think a cursor makes sense and can be applied across these different use cases. To me, it seems like the fetch call would also need to return an updated cursor to be used for the subsequent fetch. Something like:

trait Cursor[S, A] {
  def fetch(state: S): Future[(S, Seq[A])]
}

(Let me know if you guys think there's a better way to capture the updated cursor.)

For stores like redis, S could be an integer representing the cursor in scan commands.
For mysql, S could be (startingRowNum, batchSize) to be applied as LIMIT to the queries.
For cache stores like MapStore etc, S can just be the backing map's iterator (or (iterator, batchSize) for the paging case).

Does this adequately cover the use cases we foresee as of now?
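To make the "fetch returns the next state" idea concrete, here is a minimal sketch under stated assumptions: `scala.concurrent.Future` stands in for `com.twitter.util.Future`, `OffsetCursor` and `CursorDemo` are hypothetical names, and an empty batch is assumed to mean "no more data" (the thread has not settled on an end marker).

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Cursor shape from the comment above; scala.concurrent.Future stands in for
// com.twitter.util.Future (an assumption made to keep the sketch dependency-free).
trait Cursor[S, A] {
  def fetch(state: S): Future[(S, Seq[A])]
}

// Hypothetical in-memory cursor: the state is simply the offset of the next row,
// loosely mirroring an offset used with LIMIT.
final class OffsetCursor[A](rows: Vector[A], batchSize: Int) extends Cursor[Int, A] {
  def fetch(offset: Int): Future[(Int, Seq[A])] = {
    val batch = rows.slice(offset, offset + batchSize)
    Future.successful((offset + batch.size, batch))
  }
}

object CursorDemo {
  // Follows the returned state until an empty batch signals the end.
  // Treating an empty batch as the end marker is an assumption of this sketch.
  def drain[S, A](cursor: Cursor[S, A], start: S)(implicit ec: ExecutionContext): Future[Seq[A]] =
    cursor.fetch(start).flatMap {
      case (_, batch) if batch.isEmpty => Future.successful(Seq.empty[A])
      case (next, batch)               => drain(cursor, next).map(batch ++ _)
    }

  // Reads five rows, two at a time, starting from offset 0.
  def run(): Seq[Int] = {
    import ExecutionContext.Implicits.global
    Await.result(drain(new OffsetCursor(Vector(1, 2, 3, 4, 5), 2), 0), 5.seconds)
  }
}
```

Note that the caller has to know that `0` is the start state here, which is the open question about start states raised later in the thread.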

softprops · 2013-12-03T15:28:48Z

This was something that occurred to me when looking at the instrumentation branch. The *Store interfaces are general enough to be remixed into combinator features so that store combinators can be implemented in terms of other stores.

trait Cursor[S,A] extends ReadableStore[S, Seq[A]]

S denotes cursor state and A denotes the value ( or potentially key ) type of the store.

Want the next page?

cursor.get(state)

Want to fetch multiple pages at once?

cursor.multiGet(states)

avibryant · 2013-12-03T17:59:16Z

It would actually be this, right?

trait Cursor[S,A] extends ReadableStore[S, (S,Seq[A])]

And yes, that feels right to me - it's similar to a pivot combinator...

softprops · 2013-12-03T18:02:24Z

ah yes. a client would need access to S to get the next Cursor so that makes sense.

johnynek · 2013-12-03T18:28:49Z

+1

mosesn · 2013-12-04T00:14:53Z

+1

@avibryant for my edification, do you have any links to more information on the pivot combinator?

avibryant · 2013-12-04T00:21:30Z

I was referring to https://github.com/twitter/storehaus/blob/develop/storehaus-core/src/main/scala/com/twitter/storehaus/PivotedReadableStore.scala ... not exactly the same thing, but similar in that it is transforming the key space in a pretty significant way.

rubanm · 2013-12-04T07:26:40Z

Sounds good. One remaining point.. how do we denote start state (or empty cursor) for the first get?

One way is to force Option[S] and have None denote the start state. Alternatively, we can let the implementation define it.

abstract class CursorState[S] {
  def currentState: S
  def startState: S
}

trait Cursor[S,A] extends ReadableStore[CursorState[S], (CursorState[S],Seq[A])]

I have no strong preference, but the latter seems more flexible?

Sorry for the churn. This did not occur to me earlier.

mosesn · 2013-12-04T14:00:42Z

We could require a zero type class at construction time.
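A rough sketch of what a zero type class might look like, assuming nothing about the eventual API: `Zero`, `ZeroDemo` and `drainFromZero` are hypothetical names, and the paging function is synchronous only to keep the example short.

```scala
// Hypothetical type class naming the start state for a cursor state type S.
trait Zero[S] {
  def zero: S
}

object Zero {
  // For offset-style cursors the natural start is offset 0.
  implicit val intZero: Zero[Int] = new Zero[Int] { val zero: Int = 0 }
}

object ZeroDemo {
  // Pages through `fetch` starting from the implicit zero state and stops at the
  // first empty page (an assumed end-of-data convention). Synchronous on purpose;
  // a real store would return Futures.
  def drainFromZero[S, A](fetch: S => (S, Seq[A]))(implicit z: Zero[S]): Vector[A] = {
    @annotation.tailrec
    def loop(state: S, acc: Vector[A]): Vector[A] = {
      val (next, page) = fetch(state)
      if (page.isEmpty) acc else loop(next, acc ++ page)
    }
    loop(z.zero, Vector.empty)
  }

  private val letters = Vector("a", "b", "c")

  // Pages of size 2 over `letters`; the caller never names the start state.
  def allLetters(): Vector[String] =
    drainFromZero[Int, String] { offset =>
      val page = letters.slice(offset, offset + 2)
      (offset + page.size, page)
    }
}
```

The trade-off compared with `Option[S]` is that the start state lives in implicit scope per state type rather than in every signature.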

softprops · 2013-12-04T14:56:19Z

storehaus-core/src/main/scala/com/twitter/storehaus/IterableStore.scala

+ */
+trait IterableStore[K, V] extends ReadableStore[K, V] {
+
+  def items: Future[Spool[(K, V)]]


Just a thought: entries may be a better name for this. From a user perspective, Stores are a lot like java.util.Maps, in which the equivalent name would be entries. I totally get items, though, for those who are from a Python background and have a dict in mind when they think of stores.

softprops · 2013-12-04T15:48:31Z

If you wanted a split between pageable stores and iterable stores, for the pageable flavor you may not need the first type arg to be any more structured/complex than an abstract type, S. For mysql this may just be an offset/limit tuple, for instance.

here's a rough sketch

case class MySQLCursor[T](query: MySqlQuery[T]) extends Cursor[(Long, Long), T] {
  def get(state: (Long, Long)) = {
    val (offset, limit) = state
    // fetch one page, then pair it with the state for the next page
    query.offset(offset).limit(limit).fetch().map { values =>
      ((offset + limit, limit), values)
    }
  }
}

val cursor = MySQLCursor(mysqlQuery)
def all(from: (Long, Long)) = cursor.get(from).map {
  case (next, values) => values ++ all(next)
}
// read 100 at a time
all((0, 100))

iterable stores would need more structure like an initial state, and an end state to indicate there are no more results. A type class for this would probably be best

mosesn · 2013-12-09T01:43:56Z

This looks a little like the State monad actually.

johnynek · 2013-12-10T07:15:51Z

@mosesn if we pull too hard on that we will see that all of storehaus is just kleisli composition on the Future[Option[_]] monad, and these generalizations are just changing the monad. The read thing is T => M[U] where there is a Monad[M].

That said, I think there is value in making these combinators a bit more concrete for people

softprops · 2013-12-10T14:46:27Z

+1 @johnynek I think if you look hard enough everything reveals itself to be some form of monad :) The judgment of whether or not to make that explicit is up to the man behind the curtain. Sometimes the audience will give just as much applause during the show with or without that knowledge.

mosesn · 2013-12-11T00:14:11Z

@johnynek lol ok

stuhood · 2014-01-13T20:06:40Z

I don't think that an independent Cursor type is necessary. A "pageable" store is still iterable, you're just iterating over batches of entries. One approach that would be somewhat symmetrical to the WritableStore.{put, multiPut} split would be:

trait IterableStore[K,V] {
  def getAll: Future[Spool[(K,V)]] =
    multiGetAll.flatMap { batchSpool =>
       batchSpool.flatMap(_.toSpool)
    }
  def multiGetAll: Future[Spool[Seq[(K,V)]]] =
    getAll.flatMap { spool =>
      spool.map(Seq(_))
    }
}

Such that the per-item Spool can be defined in terms of either batches or individual entries.
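The mutual-default pattern can be sketched without Spool at all. In the block below a plain `Iterator` stands in for `Future[Spool[...]]`, which is an assumption made so the example runs on the standard library alone; `IterableStoreSketch`, `PerEntryStore` and `BatchedStore` are hypothetical names.

```scala
// Stand-in for the proposed trait: a plain Iterator replaces Future[Spool[...]]
// so the sketch runs on the standard library alone (an assumption).
trait IterableStoreSketch[K, V] {
  // Each method has a default in terms of the other, so an implementation
  // must override at least one of them; overriding neither recurses forever.
  def getAll: Iterator[(K, V)] = multiGetAll.flatMap(batch => batch)
  def multiGetAll: Iterator[Seq[(K, V)]] = getAll.map(entry => Seq(entry))
}

// A store that naturally produces entries one at a time overrides getAll.
final class PerEntryStore[K, V](entries: Map[K, V]) extends IterableStoreSketch[K, V] {
  override def getAll: Iterator[(K, V)] = entries.iterator
}

// A store that naturally produces pages overrides multiGetAll.
final class BatchedStore[K, V](entries: Vector[(K, V)], batchSize: Int)
    extends IterableStoreSketch[K, V] {
  override def multiGetAll: Iterator[Seq[(K, V)]] = entries.grouped(batchSize)
}
```

With this shape a per-entry store gets single-element batches for free, and a batched store gets a flattened per-entry view for free.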

stuhood · 2014-01-14T03:54:03Z

@softprops @avibryant On second thought, it might be clearer to make the trait for "Splittable" different from the trait for "Iterable"... but I still don't think an explicit "Cursor" type is necessary.

The sketch above (#191 (comment)) is sufficient for Iterable, and is easily implemented in memory by any hash map.

But there are extra considerations in the Splittable case that make it more difficult for an implementer:

- Consumers might want to execute the split method in one process, and then distribute the splits/ranges to multiple processes for execution (in which case K would need to be serializable, but we can leave that to the consumer).
- Implementers would need to be able to generate a Store bounded by two keys/Cursors (which a JMap/Map will not let you do, but which SortedMap will).

So a potential Splittable trait might be:

trait Splittable[K,V] {
  type Range = (K,K)
  /** Generates a sequence of Ranges that together include this entire Store. */
  def split(...): Future[Seq[Range]]
  /** Returns a view of a sub-range of this Store. Analogous to Java's SortedMap.subMap method. */
  def subStore(range: Range): this.type
}
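As a rough in-memory illustration of the Splittable idea, here is a sketch backed by a Scala `SortedMap`. `SortedMapSplittable` is a hypothetical name, the `split` parameter (elided as `...` above) is assumed to be a number of parts, ranges are assumed to be inclusive on both ends, and the methods are synchronous rather than returning a `Future`.

```scala
import scala.collection.immutable.SortedMap

// Hypothetical in-memory take on the Splittable idea. Ranges are inclusive on
// both ends here (an assumption; the proposal above does not say).
final class SortedMapSplittable[K, V](val backing: SortedMap[K, V]) {
  type Range = (K, K)

  // Cuts the key space into at most `parts` contiguous ranges.
  def split(parts: Int): Seq[Range] = {
    require(parts > 0, "parts must be positive")
    val keys = backing.keysIterator.toVector
    if (keys.isEmpty) Vector.empty
    else {
      val chunk = math.ceil(keys.size.toDouble / parts).toInt
      keys.grouped(chunk).map(group => (group.head, group.last)).toVector
    }
  }

  // Copy of the entries whose keys fall inside `range`. A linear filter keeps
  // the sketch short; SortedMap.range would avoid scanning the whole map.
  def subStore(range: Range): SortedMapSplittable[K, V] = {
    val (low, high) = range
    val ord = backing.ordering
    new SortedMapSplittable(backing.filter { case (k, _) =>
      ord.gteq(k, low) && ord.lteq(k, high)
    })
  }
}
```

The ranges produced by `split` are plain key pairs, so they could be shipped to other processes as long as `K` is serializable, which matches the first consideration above.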

softprops · 2014-01-14T13:41:28Z

Man, so much to catch up on lately.

trait IterableStore[K,V] {
  def getAll: Future[Spool[(K,V)]] =
    multiGetAll.flatMap { batchSpool =>
      batchSpool.flatMap(_.toSpool)
    }

If there's a way to design this so that IterableStore is itself a Store I'd be happier. Maintaining a Store identity means it could have all the benefits the other (and future) Store combinators provide for free. I still think you can make IterableStore the product of some combinator iterator? on a ReadableStore that wraps the store with some extra machinery. I wish I had a bit more time to sketch out a proof of concept here, but I think it's doable.

trait Splittable[K,V] {
  type Range = (K,K)

This is starting to sound a lot like the (Un)Pivoted family of Stores. In a similar manner, the combinator for these takes something like what your split function would define, except that it's on a per-key/value basis. I don't think Range is sufficient for all cases of getting a slab of a store, though it may be in cases like mysql range queries. I'm mainly thinking of redis scan which goes back to the cursor type thing before. I'd really look back through the (Un)Pivoted catalog before adding a new type that does something very similar but doesn't produce a new Store.

As a design goal, it would be nice to think of storehaus as a small set of primitives with a small surface area interface and sets of combinators for adding behavior while producing new stores which maintain the primitives interface, so that the creation of stores can easily be a small matter of composition. I understand not all square pegs fit in round holes so I can see some leeway (preferably small), but it would be something nice to keep in mind. As a user of an instance of a Store, I should mainly only have to think about the get/put operations + the K/V types. I think @sritchie and @johnynek originally had that idea baked into the design. (And they're both really good designers!)

stuhood · 2014-01-14T18:52:15Z

If there's a way to design this so that IterableStore is itself a Store I'd be happier.

I was thinking it was just a trait without a self-type, similar to WritableStore. But yea, I don't see any reason why it couldn't self-type Store.

I still think you can make IterableStore the product of some combinator iterator? on a ReadableStore that
wraps the store with some extra machinery.

Maybe, but I think the collection.Iterable->Spool factory function alone probably makes it sufficiently easy to implement the IterableStore trait manually.

I'm mainly thinking of redis scan which goes back to the cursor type thing before.

Yea, that's a good point. So the Cursor type does need to be a generic parameter to Splittable.

trait SplittableStore[K,V,Cursor] {
  type Range = (Cursor,Cursor)
  ...
}

rubanm · 2014-01-14T19:37:19Z

I agree with @softprops that we should try and have IterableStore itself as a store.
It probably does not align with how many backing stores allow iterating over keyspace, e.g. redis has get, put, and scan.
But -- it plays really well with how our store combinators work. So +1 to that.

My 2 cents building on the ideas proposed so far..

Store approach:

trait IterableStore[S,A] extends ReadableStore[S, (S,Seq[A])] 
// S is the cursor type
// A can be K or V or (K, V) depending on what you want to iterate over

trait SplittableStore[R, K, V] extends ReadableStore[R, ReadableStore[K, V]] // or Spool[(K, V)]
// R is the range, can be a cursor pair, or something else if applicable
// get(r: R) will be analogous to subStore or subMap

Or, combinator approach:

class IterableStore[S, K, V, A](store: ReadableStore[K, V])
  (implicit inj: Injection[(K, V), A])
  extends ReadableStore[S, (S, Seq[A])]

class SplittableStore[R, K, V](store: ReadableStore[K, V])
  extends ReadableStore[R, ReadableStore[K, V]] // or Spool[(K, V)]

stuhood · 2014-01-14T20:28:05Z

Argh, sorry. Dumb UI / dumb user.

stuhood · 2014-01-14T20:38:15Z

@rubanm I don't think requiring a public Cursor type in IterableStore is very helpful; it:

- makes an unnecessary implementation detail of the Store public
- makes things more complicated in the simple case of "turn something that is collection.Iterable into an IterableStore"
- doesn't make things significantly simpler for the implementer of a batched IterableStore (see https://gist.github.com/stuhood/8425008)
- makes it impossible to have an IterableStore which is not iterable from a random position

stuhood · 2014-01-14T20:49:07Z

... but +1 for the proposed SplittableStore.

Merge from twitter/storehaus

rubanm · 2014-01-20T19:19:04Z

@stuhood Hmm valid points. So we let the stores maintain cursor state. I think this also adequately covers the paging case as long as you are not looking for random reads.

For random reads (like mysql range or limit/offset queries etc), we can use a SplittableStore. Alternatively, there is a QueryableStore in the works here #205

I've updated the pull req.

Also, thanks for the Spool fix.

stuhood · 2014-01-20T19:42:41Z

storehaus-core/src/main/scala/com/twitter/storehaus/IterableStore.scala

+  def fromMap[K, V](m: Map[K, V]): IterableStore[K, V] = new MapStore(m)
+
+  /** Helper method to convert Iterator to Spool. */
+  def iteratorToSpool[K, V](it: Iterator[(K, V)]): Future[Spool[(K, V)]] = {


It should be possible to do this without the fillSpool method:

def iteratorToSpool[K, V](it: Iterator[(K, V)]): Spool[(K, V)] =
  if (it.hasNext) it.next *:: Future.value(iteratorToSpool(it))
  else Spool.empty

stuhood · 2014-01-20T19:44:57Z

Thanks, looks good! Might be worthwhile to implement IterableStore on a few of storehaus' included Stores, just to make sure we're not missing anything. Or I'd be happy to do that in a separate pull?

EDIT: Whoops: missed the MapStore update in there. Looks good with the exception of dropping fillSpool.

johnynek · 2014-01-20T20:20:33Z

storehaus-core/src/main/scala/com/twitter/storehaus/MapStore.scala

@@ -23,6 +23,9 @@ import com.twitter.util.Future
 *  @author Oscar Boykin
 *  @author Sam Ritchie
 */
-class MapStore[K, +V](val backingStore: Map[K, V] = Map[K, V]()) extends ReadableStore[K, V] {
+class MapStore[K, V](val backingStore: Map[K, V] = Map[K, V]()) extends ReadableStore[K, V]


why did we lose +V?
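For context on what the `+V` buys, here is a small sketch with hypothetical stand-ins (`ReadOnlyStore`, `CovariantMapStore`, `VarianceDemo`) reduced to a synchronous `get`; these are not the storehaus types, only an illustration of covariance in the value parameter.

```scala
// Minimal stand-ins for ReadableStore and MapStore (hypothetical names),
// reduced to a synchronous get so the sketch needs no dependencies.
trait ReadOnlyStore[K, +V] {
  def get(k: K): Option[V]
}

// With +V, a store of Ints can be used wherever a store of Any is expected.
class CovariantMapStore[K, +V](val backingStore: Map[K, V]) extends ReadOnlyStore[K, V] {
  def get(k: K): Option[V] = backingStore.get(k)
}

object VarianceDemo {
  val ints: CovariantMapStore[String, Int] = new CovariantMapStore(Map("a" -> 1))

  // Compiles only because CovariantMapStore is declared with +V; with an
  // invariant V this assignment would be rejected by the compiler.
  val widened: CovariantMapStore[String, Any] = ints
}
```

Dropping the `+` would force callers who hold a concrete `MapStore[K, Int]` to rebuild it before passing it where a wider value type is expected.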

rubanm · 2014-01-20T21:04:38Z

Fixed variance and dropped fillSpool. Thanks.

IterableStore

add iterable store trait

5d897c6

johnynek reviewed Nov 22, 2013

change to spool

0bd039b

johnynek reviewed Nov 26, 2013

address further comments

4421685

ryanlecompte reviewed Nov 26, 2013

softprops reviewed Dec 4, 2013

stuhood closed this Jan 14, 2014

stuhood reopened this Jan 14, 2014

rubanm and others added 4 commits January 20, 2014 09:57

Merge pull request #15 from twitter/develop

b8cdbb5

Merge from twitter/storehaus

Merge branch 'develop' into feature/iterable_store

d64bf25

update util-core version

ac6d39b

getAll, multiGetAll

6e101fe

stuhood reviewed Jan 20, 2014

johnynek reviewed Jan 20, 2014

rubanm added 2 commits January 20, 2014 12:48

fix variance

c024f20

cleaner iteratorToSpool

54ab8b3

johnynek added a commit that referenced this pull request Jan 20, 2014

Merge pull request #191 from rubanm/feature/iterable_store

c87ca02

IterableStore

johnynek merged commit c87ca02 into twitter:develop Jan 20, 2014

rubanm mentioned this pull request Apr 25, 2014

creating pull request for storehaus-cassandra #211

Open
